{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Ol-xbgTvlzz2"
   },
   "source": [
    "# DMS and Chlorophyll-A for CMAQ\n",
    "\n",
    "---\n",
    "    author: Barron H. Henderson\n",
    "    contributors: Brett Gantt, Jeff Willison, Golam Sarwar, and Sara Farrell\n",
    "    date: 2021-03-23\n",
    "    last updated: 2024-06-12\n",
    "---\n",
    "\n",
    "This notebook creates CMAQ-Ready input files necessary for CMAQ halogen and DMS. DMS chemistry requires DMS concentrations and halogen chemistry relies on chlorophyll concentrations. The Chlorophyll is extracted from NASA MODIS-Aqua level-3 data prodcuts. The DMS is created from monthly climatologies from the Surface Ocean and Lower Atmosphere (SOLAS) project.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Specify User Input Options\n",
    "\n",
    "* User input options are described below.\n",
    "* Most users will update `dom`, `ocnintmpl`, `ocnouttmpl`, and `gdpath`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# dom : str\n",
    "#     Name of output domain. For example, 12US1, 36US3, 108NHEMI2. This is used\n",
    "#     to name outputs and inputs.\n",
    "dom = '12US1'\n",
    "\n",
    "# ocnintmpl : str\n",
    "#     Path to OCEAN file with OPEN (0-1) and SURF (0-1) variables. The path can\n",
    "#     use strftime templates to construct date specific paths. The strftime\n",
    "#     function described at https://strftime.net/. For 2016-01-01, here are a\n",
    "#     few examples: %F = 2016-01-01, %Y%j = 2016001, %m%d = 0101, %b = Jan,\n",
    "#     %^b = JAN. e.g., OCEAN_%Y%b.nc = OCEAN_2016Jan.nc\n",
    "ocnintmpl = f'/work/MOD3DATA/2016_{dom}/surface/{dom}_surf.ncf'\n",
    "\n",
    "# ocnouttmpl : str\n",
    "#     strftime template to create a new file. The new file will have ocnintmpl\n",
    "#     variables in addition to DMS and CHLO.\n",
    "ocnouttmpl = f'output/{dom}/OCEAN_%m_L3m_MC_CHL_chlor_a_{dom}.nc'\n",
    "\n",
    "# gdpath : str\n",
    "#    Path to an IOAPI file using the domain (dom). Most of the time you can use\n",
    "#    your ocean file. If your ocnintmpl is time varying (i.e.,  uses strftime),\n",
    "#    then you will need to update this to hard code a specific path. e.g.,\n",
    "#    gdpath = ocnintmpl.replace('%b', 'Jan')\n",
    "gdpath = ocnintmpl\n",
    "\n",
    "# overwrite : bool\n",
    "#     Default False, keep existing intermediate files. This is faster by a lot\n",
    "#     when redoing a domain, but uses cached results. If True, recreate all.\n",
    "overwrite = False\n",
    "\n",
    "# getlatestchlo : bool\n",
    "#     Default True, discover latest climatology urls from NASA server If False,\n",
    "#     use a prexisting list of known urls. Known urls can be any url that NetCDF\n",
    "#     can read (e.g, OpenDAP or local paths). They can also be month specific\n",
    "#     instead of climatology\n",
    "getlatestchlo = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xGn3VnPDLF4j"
   },
   "source": [
    "# Install Prerequisites\n",
    "\n",
    "* On Google Colab or other web-based platforms, you may need to install some non-standard libraries\n",
    "* To do this, the notebook will use a combination of `apt-get` or `miniconda`, and  `pip`\n",
    "* Both can be run from within this notebook by specifying options below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "kEMW45MRTS2b"
   },
   "outputs": [],
   "source": [
    "installprereq = True\n",
    "installcdo = False\n",
    "# Use a preinstalled version of cdo in the user path\n",
    "cdopath = 'cdo'\n",
    "# Or specify a specific path. For example, at EPA use :\n",
    "# cdopath = '/work/ROMO/anaconda_envs/cdo-1.9.8/bin/cdo'\n",
    "# If you intall cdo with miniconda\n",
    "# cdopath = './miniconda/bin/cdo'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Oqy0eNJVN_Bi"
   },
   "source": [
    "## Install Climate Data Operators\n",
    "\n",
    "* If you do not have climate data operators, this can install them for you.\n",
    "    * There are two options, the first is `apt-get install cdo` which works on many Debian based linux systems.\n",
    "    * The second is more robust and installs from Anaconda.\n",
    "    * You can also install them yourself.\n",
    "* This may take a couple minutes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_Rj0_8nn93gD"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "if installcdo:\n",
    "    cdopath = './miniconda/bin/cdo'\n",
    "    if not os.path.exists(cdopath):\n",
    "        !wget -N -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\n",
    "        !bash ./Miniconda3-latest-Linux-x86_64.sh  -b -p ./miniconda &> log.miniconda\n",
    "        !./miniconda/bin/conda install -c conda-forge -q -y 'cdo~=1.9' &> log.cdo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dVl1jAM8-71q"
   },
   "source": [
    "## pip\n",
    "\n",
    "* Install required prerequsites\n",
    "* This will be faster than installing cdo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-vhPJkCM97se",
    "outputId": "9733bd90-1a69-4001-f6c4-3a9e70559790"
   },
   "outputs": [],
   "source": [
    "if installprereq:\n",
    "    # If on Google Colab:\n",
    "    # 1. Copy the requirements file to this folder,\n",
    "    # 2. remove --user\n",
    "    !python -m pip install -q --user -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "aY78jFw5LN_k"
   },
   "source": [
    "## Now restart the Runtime\n",
    "\n",
    "* Optional.\n",
    "* Click on the Runtime menu.\n",
    "* Click `Restart Runtime`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Bb0x3o4COLsN"
   },
   "source": [
    "# Quick Run\n",
    "\n",
    "* From here, you can click Runtime, Run after\n",
    "* This will run everything and then give you a dialog to download the results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LtBUDryDLU5P"
   },
   "source": [
    "# Import libraries\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1Za8quom-Jw5"
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "importsuccess = False\n",
    "from urllib.request import urlretrieve\n",
    "import os\n",
    "from datetime import datetime\n",
    "from glob import glob\n",
    "import zipfile\n",
    "import warnings\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "\n",
    "import cdo\n",
    "import pycno\n",
    "import PseudoNetCDF as pnc\n",
    "\n",
    "os.environ['IOAPI_ISPH'] = '6370000.'\n",
    "warnings.simplefilter('ignore')\n",
    "importsuccess = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FoVLIdQSBJSp"
   },
   "source": [
    "## Prepare Climate Data Operators\n",
    "\n",
    "* Used for spatial interpolation\n",
    "* Instantiate an operator object (`cdoo`) that will be used in the rest of the project.\n",
    "* Set `debug=True` for detailed feedback"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "48wbKUlv_p-Q"
   },
   "outputs": [],
   "source": [
    "%pdb off\n",
    "cdoo = cdo.Cdo(cdopath)\n",
    "cdoo.setCdo(cdopath)\n",
    "print('CDI version', cdoo.version())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jm97a_WSBTyK"
   },
   "source": [
    "# Define the CMAQ Grid\n",
    "\n",
    "Script will create folders for dom, open file and create a CDO grid mapping definition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "PNJHfh9F_lqN"
   },
   "outputs": [],
   "source": [
    "os.makedirs('cdogrids', exist_ok=True)\n",
    "os.makedirs(f'output/{dom}', exist_ok=True)\n",
    "os.makedirs(f'chlor_a/{dom}', exist_ok=True)\n",
    "os.makedirs(f'dmsclimatology/{dom}', exist_ok=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MNbuxMR53G41"
   },
   "outputs": [],
   "source": [
    "gdf = pnc.pncopen(gdpath, format='ioapi')\n",
    "proj = gdf.getproj()\n",
    "gproj = gdf.getproj(withgrid=True)\n",
    "crs = proj.crs.to_cf()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "weQTj7odsz43"
   },
   "outputs": [],
   "source": [
    "cdogrid = f'cdogrids/{dom}.cdo'\n",
    "with open(cdogrid, 'w') as cdogf:\n",
    "    cdogf.write(f\"\"\"\n",
    "gridtype = projection\n",
    "gridsize = {gdf.NROWS * gdf.NCOLS}\n",
    "xname = COL\n",
    "yname = ROW\n",
    "xsize = {gdf.NCOLS}\n",
    "ysize = {gdf.NROWS}\n",
    "xinc = {gdf.XCELL}\n",
    "xfirst = {gdf.XCELL / 2:.0f}\n",
    "yinc = {gdf.YCELL}\n",
    "yfirst = {gdf.YCELL / 2:.0f}\n",
    "grid_mapping_name = {crs['grid_mapping_name']}\n",
    "longitude_of_projection_origin = {gdf.XCENT}\n",
    "latitude_of_projection_origin = {gdf.YCENT}\n",
    "\"\"\")\n",
    "    for k, v in crs.items():\n",
    "        cdogf.write(f\"{k} = {v}\\n\".replace('(', '').replace(')', ''))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4gE-JC-YLgGO"
   },
   "source": [
    "# DMS Processing\n",
    "\n",
    "* Surface Ocean Lower Atmospheric Study (SOLAS)\n",
    "* Created a climatology of DMS (under short-lived species)\n",
    "* https://www.bodc.ac.uk/solas_integration/\n",
    "\n",
    "\n",
    "* Steps:\n",
    "  1. Download the data\n",
    "  2. Extract the CSV files\n",
    "  3. Create a netCDF file with known longitude and latitude.\n",
    "  4. Visualize\n",
    "  5. Regrid DMS"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f1SH2iGIAs7p"
   },
   "source": [
    "## Download DMS Climatology\n",
    "\n",
    "* Downloads file to dmsclimatology folder.\n",
    "* If you prefer, download it there yourself manually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Dsxsj5-zliXT"
   },
   "outputs": [],
   "source": [
    "dmsurl = 'https://www.bodc.ac.uk/solas_integration/implementation_products/group1/dms/documents/dmsclimatology.zip'\n",
    "dmsdest = 'dmsclimatology/dmsclimatology.zip'\n",
    "if not os.path.exists(dmsdest):\n",
    "    urlretrieve(dmsurl, dmsdest, reporthook=lambda c, s, t: print(f'\\r {min(1, c*s/t):5.1%}', end='', flush=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "SQKxTJwyLtZW"
   },
   "source": [
    "## Extract CSV"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1KUkmDBclsOt"
   },
   "outputs": [],
   "source": [
    "zf = zipfile.ZipFile(dmsdest)\n",
    "zf.extractall(path='dmsclimatology')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BhLvFO2ALwGf"
   },
   "source": [
    "## Create a NetCDF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "rbf5_7XOlwi9"
   },
   "outputs": [],
   "source": [
    "dmsncpath = 'dmsclimatology/dmsconcentration.nc'\n",
    "if overwrite or not os.path.exists(dmsncpath):\n",
    "    if os.path.exists(dmsncpath):\n",
    "        os.remove(dmsncpath)\n",
    "\n",
    "    dmsfile = pnc.PseudoNetCDFFile()\n",
    "    dmsfile.createDimension('time', 12)\n",
    "    dmsfile.createDimension('latitude', 180)\n",
    "    dmsfile.createDimension('longitude', 360)\n",
    "\n",
    "    timev = dmsfile.createVariable('time', 'd', ('time',))\n",
    "    refdate = datetime(2000, 1, 1)\n",
    "    middate = datetime(2000, 1, 15)\n",
    "    timev[:] = [(middate.replace(month=i) - refdate).total_seconds() / 3600 / 24 for i in range(1, 13)]\n",
    "    timev.units = 'days since 2000-01-01'\n",
    "    timev.long_name = 'time'\n",
    "\n",
    "    lonv = dmsfile.createVariable('longitude', 'd', ('longitude',))\n",
    "    lonv[:] = np.linspace(-179.5, 179.5, 360)\n",
    "    lonv.units = 'degrees_east'\n",
    "    lonv.long_name = 'longitude'\n",
    "\n",
    "    latv = dmsfile.createVariable('latitude', 'd', ('latitude',))\n",
    "    latv[:] = np.linspace(-89.5, 89.5, 180)\n",
    "    latv.units = 'degrees_north'\n",
    "    latv.long_name = 'latitude'\n",
    "\n",
    "    dmsfile.setCoords(['time', 'longitude', 'latitude'])\n",
    "\n",
    "    dmsv = dmsfile.createVariable('DMS', 'f', ('time', 'latitude', 'longitude'), fill_value=-999)\n",
    "    dmsv.units = 'nM'.ljust(16)\n",
    "    dmsv.long_name = 'DMS'.ljust(16)\n",
    "    dmsv.description = \"Hansell et al. seawater DMS climatology\".ljust(80)\n",
    "\n",
    "    for ti, monthname in enumerate(['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']):\n",
    "        dmsdata = pd.read_csv(f'dmsclimatology/DMSclim_{monthname}.csv', names=np.linspace(-179.5, 179.5, 360), na_values=['NaN'])\n",
    "        dmsdata.set_index(np.linspace(-89.5, 89.5, 180)[::-1], inplace=True)\n",
    "        dmsv[ti] = dmsdata.values[::-1, ]\n",
    "        dmsv[np.isnan(dmsv[:])] = np.ma.masked\n",
    "\n",
    "    dmsfile.save(dmsncpath, format='NETCDF4_CLASSIC', verbose=0).close()\n",
    "else:\n",
    "    print('Keeping file', dmsncpath)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TRywvdIwHkA3"
   },
   "source": [
    "## Now regrid DMS to model domain\n",
    "\n",
    "* `overwrite` is set to False, so if you re-run, it will keep old outputs. \n",
    "* `usesetmisstonn` is set to true, this interpolates valid values where data is missing.\n",
    "  * This is super helpful for missing data.\n",
    "  * This creates DMS overland.\n",
    "  * If `usesetmisstonn`, you'll want to mask out overland."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "t0AxBqGyuPBk"
   },
   "outputs": [],
   "source": [
    "# Choose one or neither\n",
    "usesetmisstonn = False\n",
    "usefillmiss = False\n",
    "\n",
    "dmsoutpath = f'dmsclimatology/{dom}/dmsconcentration.{dom}.nc'\n",
    "if overwrite or not os.path.exists(dmsoutpath):\n",
    "    if os.path.exists(dmsoutpath):\n",
    "        os.remove(dmsoutpath)\n",
    "    if usesetmisstonn:\n",
    "        cdoo.setmisstoc(f'0 -remapycon,{cdogrid} -setctomiss,-999. -setmisstonn', input=dmsncpath, output=dmsoutpath, returnCdf=False)\n",
    "    elif usefillmiss:\n",
    "        cdoo.setmisstoc(f'0 -remapycon,{cdogrid} -setctomiss,-999. -fillmiss', input=dmsncpath, output=dmsoutpath, returnCdf=False)\n",
    "    else:\n",
    "        cdoo.setmisstoc(f'0 -remapycon,{cdogrid} -setctomiss,-999.', input=dmsncpath, output=dmsoutpath, returnCdf=False)\n",
    "else:\n",
    "    print('Keeping file', dmsoutpath)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2B_P_LYUL-yQ"
   },
   "source": [
    "# Process Monthly Chlorophyll-A\n",
    "\n",
    "* This tutorial uses climatalogical Chlorophyll\n",
    "* At the download step, you can switch to year-specific by following special instructions\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Cp1t3kMlURg4"
   },
   "source": [
    "## Download Monthly Chlorophyll-A\n",
    "\n",
    "* This tutorial uses climatalogical Chlorophyll\n",
    "* If this is okay for your project, just run the next cell without any edits.\n",
    "* You can update it to use year-specific values\n",
    "  * Go to https://oceancolor.gsfc.nasa.gov/l3/\n",
    "  * Choose Standard Product, MODIS-Aqua, Chlorophyll Concentration, Monthly, 9km\n",
    "  * Click Extract or Download\n",
    "  * Choose \"Mapped\" and Click \"Download\"\n",
    "  * Copy the urls from the webpage list and paste over the results below.\n",
    "  * Replace the cgi url with the opendap url\n",
    "    * replace \"https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/\"\n",
    "    * with \"https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/%Y/%m%d/\n",
    "    * where %Y, %m, and %d are the year, month and day in the file names. (e.g., 20020701)\n",
    "    * AQUA_MODIS.%Y%m%d_%Y%m%d.L3m.MO.CHL.chlor_a.9km.nc\n",
    "  * For example, the original URL https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/AQUA_MODIS.20020701_20020731.L3m.MO.CHL.chlor_a.9km.nc becomes https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2002/0701/AQUA_MODIS.20020701_20020731.L3m.MO.CHL.chlor_a.9km.nc\n",
    "\n",
    "* You can also use files that you have already downloaded, by setting urls to point to those files.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if getlatestchlo:\n",
    "    from urllib.request import urlopen\n",
    "    import re\n",
    "\n",
    "    urls = []\n",
    "    webroot = 'https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/'\n",
    "    for prefix in ['2002/0701', '2002/0801', '2002/0901', '2002/1001', '2002/1101', '2002/1201', '2003/0101', '2003/0201', '2003/0301', '2003/0401', '2003/0501', '2003/0601']:\n",
    "        htmlout = urlopen(webroot + prefix)\n",
    "        htmltxt = htmlout.read().decode()\n",
    "        mostrecent = sorted(re.compile('(?<=>).+L3m_MC_CHL_chlor_a_9km.nc(?=</)').findall(htmltxt))[-1]\n",
    "        urls.append(webroot + prefix + '/' + mostrecent)\n",
    "    \n",
    "    urls = '\\n'.join(urls)\n",
    "else:\n",
    "    urls = \"\"\"\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2002/0701/AQUA_MODIS.20020701_20020731.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2002/0801/AQUA_MODIS.20020801_20020831.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2002/0901/AQUA_MODIS.20020901_20020930.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2002/1001/AQUA_MODIS.20021001_20021031.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2002/1101/AQUA_MODIS.20021101_20021130.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2002/1201/AQUA_MODIS.20021201_20021231.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2003/0101/AQUA_MODIS.20030101_20030131.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2003/0201/AQUA_MODIS.20030201_20030228.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2003/0301/AQUA_MODIS.20030301_20030331.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2003/0401/AQUA_MODIS.20030401_20030430.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2003/0501/AQUA_MODIS.20030501_20030531.L3m.MO.CHL.chlor_a.9km.nc\n",
    "https://oceandata.sci.gsfc.nasa.gov:443/opendap/MODISA/L3SMI/2003/0601/AQUA_MODIS.20030601_20030630.L3m.MO.CHL.chlor_a.9km.nc\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "_VOtxSghzNZO",
    "outputId": "e0d05c32-a164-42c6-8de6-9f1e9d5369bd"
   },
   "outputs": [],
   "source": [
    "print(urls)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kfTWMJ3UHBfS"
   },
   "source": [
    "## Now regrid the Chlorophyll-A\n",
    "\n",
    "* `overwrite` is set to false, so if you re-run, it will keep old outputs. \n",
    "* `usefill` is set to true, this interpolates valid values where data is missing.\n",
    "  * This is super helpful for missing data.\n",
    "  * This creates Chlorophyll-A overland.\n",
    "  * If `usefill`, you'll want to mask out overland using an OCEAN file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "XZVMxc4mB1pl",
    "outputId": "f6f00269-2b32-4c0f-9617-1b9a5adedf46"
   },
   "outputs": [],
   "source": [
    "# Filling missing values over land. This can look quite weird when places\n",
    "# like the Utah Salt Lake create spikes that persist over land.\n",
    "# When you are going to mask with LAND, this is useful.\n",
    "#\n",
    "# * usefill must be  performed globally, because fillmiss is not supported\n",
    "#   for the projections... the global application makes it unnecessarily slow.\n",
    "# * usesetmisstonn is perfomed on the regional grid, so it is fast.\n",
    "usefill = False\n",
    "useglobalsetmisstonn = False\n",
    "usesetmisstonn = True\n",
    "for chlinpath in urls.split():\n",
    "    tmppath = os.path.join('chlor_a', os.path.basename(chlinpath))\n",
    "    chloutpath = os.path.join('chlor_a', dom, os.path.basename(chlinpath.replace('9km', dom)))\n",
    "    print('Input', chlinpath)\n",
    "    print('Output', chloutpath)\n",
    "\n",
    "    if overwrite or not os.path.exists(tmppath):\n",
    "        print('Downloading..', end='.', flush=True)\n",
    "        if os.path.exists(tmppath):\n",
    "            os.remove(tmppath)\n",
    "        if usefill:\n",
    "            cdoo.fillmiss(' -selvar,chlor_a', input=chlinpath, output=tmppath, returnCdf=False)\n",
    "        elif useglobalsetmisstonn:\n",
    "            cdoo.setmisstonn(' -selvar,chlor_a', input=chlinpath, output=tmppath, returnCdf=False)\n",
    "        else:\n",
    "            cdoo.selvar('chlor_a', input=chlinpath, output=tmppath, returnCdf=False)\n",
    "    else:\n",
    "        print('Keeping existing..', end='.', flush=True)\n",
    "\n",
    "    if overwrite or not os.path.exists(chloutpath):\n",
    "        print('Regridding...', flush=True)\n",
    "        if os.path.exists(chloutpath):\n",
    "            os.remove(chloutpath)\n",
    "        if usesetmisstonn:\n",
    "            cdoo.setmisstonn(f' -remapycon,{cdogrid} -setctomiss,-32767.', input=tmppath, output=chloutpath)\n",
    "        else:\n",
    "            cdoo.setmisstoc(f'0 -remapycon,{cdogrid} -setctomiss,-32767.', input=tmppath, output=chloutpath)\n",
    "    else:\n",
    "        print('Keeping existing')\n",
    "print('Done')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Bn4xT1BaDv0W"
   },
   "source": [
    "# Combine DMS and Chlorophyll-A in a CMAQ-ready File\n",
    "\n",
    "* OPEN and SURF will be taken from 1 OCEAN file.\n",
    "* DMS will be taken from 12-monthly representative days.\n",
    "* CHLO will be taken from 12-monthly representative files.\n",
    "* Data will be visualized to confirm.\n",
    "\n",
    "\n",
    "* Steps\n",
    "    * Combine\n",
    "    * Visualize"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "vxQlMPQtMeai"
   },
   "source": [
    "## Combine DMS and Chlorophyll-A\n",
    "\n",
    "* DMS and CHLO will be added to monthly copies of the OCEAN file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "jyVNghUkhsNw",
    "outputId": "fc5100b4-28e6-4bad-d672-c2fa95841e97"
   },
   "outputs": [],
   "source": [
    "dmsf = pnc.pncopen(f'dmsclimatology/{dom}/dmsconcentration.{dom}.nc', format='netcdf')\n",
    "ocnoutpaths = []\n",
    "for chloutpath in sorted(glob(f'chlor_a/{dom}/AQUA*{dom}.nc')):\n",
    "    print(chloutpath)\n",
    "    mydate = datetime.strptime(os.path.basename(chloutpath)[11:19], '%Y%m%d')\n",
    "    # seaicepath = mydate.strftime(f'output/{dom}/SEAICE_2020%m01_CLIM.nc')\n",
    "    ocnoutpath = mydate.strftime(ocnouttmpl)\n",
    "    ocnoutpaths.append(ocnoutpath)\n",
    "    ocnpath = mydate.strftime(ocnintmpl)\n",
    "    ocnf = pnc.pncopen(ocnpath, format='ioapi')\n",
    "    if not overwrite and os.path.exists(ocnoutpath):\n",
    "        print('Keeping', ocnoutpath)\n",
    "        continue\n",
    "    if os.path.exists(ocnoutpath):\n",
    "        os.remove(ocnoutpath)\n",
    "\n",
    "    ti = mydate.month - 1\n",
    "    print(ti, mydate)\n",
    "    dmstimef = dmsf.slice(time=ti)\n",
    "    dmsinv = dmstimef.variables['DMS']\n",
    "    chlf = pnc.pncopen(chloutpath, format='netcdf')\n",
    "    chlinv = chlf.variables['chlor_a']\n",
    "    outf = pnc.cmaqfiles.ioapi_base.from_ncf(ocnf)\n",
    "    dmsoutv = outf.copyVariable(dmsinv, key='DMS', dtype='f', dimensions=('TSTEP', 'LAY', 'ROW', 'COL'))\n",
    "    dmsoutv.var_desc = 'DMS'.ljust(80)\n",
    "    dmsoutv.units = dmsinv.units[:16].ljust(16)\n",
    "    chloutv = outf.copyVariable(chlinv, key='CHLO', dtype='f', dimensions=('TSTEP', 'LAY', 'ROW', 'COL'))\n",
    "    chloutv.var_desc = chloutv.long_name[:80].ljust(80)\n",
    "    chloutv.long_name = 'CHLO'.ljust(16)\n",
    "    chloutv.units = chlinv.units[:16].ljust(16)\n",
    "\n",
    "    island = (ocnf.variables['OPEN'][:] + ocnf.variables['SURF']) == 0\n",
    "    dmsoutv[island] = 0\n",
    "    chloutv[island] = 0\n",
    "\n",
    "    # When chlor_a is masked, the values are missing. Usually, this is due to seaice\n",
    "    # However, seaice may not perfectly match. If remaining missing values are present\n",
    "    # Set them to zero.\n",
    "    dmsoutv[np.ma.getmaskarray(chloutv[:])] = 0\n",
    "    chloutv[np.ma.getmaskarray(chloutv[:])] = 0\n",
    "\n",
    "\n",
    "    outf.SDATE = 2000001\n",
    "    outf.TSTEP = 10000\n",
    "    outf.updatemeta()\n",
    "    outf.updatetflag(overwrite=True)\n",
    "    outf.SDATE = -635\n",
    "    outf.TSTEP = 0\n",
    "    outf.variables['TFLAG'][:, :, :] = 0\n",
    "    outf.FILEDESC = outf.FILEDESC.strip() + (\n",
    "        f\"; OCEAN file {ocnpath}\\n\"\n",
    "        + f\"DMS added from SOLAS project after regridding and gap filling\\n({dmsurl})\\n\"\n",
    "        + f\"CHLO added from MODIS after regridding and gap filling\\n{chloutpath}\\n\"\n",
    "    )\n",
    "    outf.save(ocnoutpath, format='NETCDF3_CLASSIC', complevel=1, verbose=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "m6YPF65mMrW2"
   },
   "source": [
    "## Print CMAQ-ready File Description\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "eq5Kfd2m9uyK",
    "outputId": "cdae52d3-2bf3-4563-bdb5-9ca0bb7f4b50"
   },
   "outputs": [],
   "source": [
    "outf = pnc.pncopen(ocnoutpath, format='ioapi')\n",
    "print(outf.FILEDESC)\n",
    "print(outf.HISTORY)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "J1RPPLYqhHXX"
   },
   "source": [
    "# Download CMAQ-Ready Files\n",
    "\n",
    "If you are on a cloud processing system, you may want to download the files for reuse on another platform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "dLE6gkWBgbVq",
    "outputId": "a6db4df5-835b-430d-e7ba-41889a6cdb1f"
   },
   "outputs": [],
   "source": [
    "with zipfile.ZipFile('downloaddmschlo.zip', 'w') as zf:\n",
    "    print(ocnpath)\n",
    "    zf.write(ocnpath)\n",
    "    for ocnoutpath in ocnoutpaths:\n",
    "        print(ocnoutpath)\n",
    "        zf.write(ocnoutpath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "JIO18c2i3DnW"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "CMAQ_DMS_ChlorA.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "geospatial_rh8",
   "language": "python",
   "name": "geospatial_rh8"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
